feat: OTEL metrics pipeline for task workers #3061
🦋 Changeset detected
Latest commit: a4497ed
The changes in this PR will be included in the next version bump. This PR includes changesets to release 28 packages.
Walkthrough
Adds per-column display format hints (ColumnFormatType) through the tsql schema and printer, and introduces format-aware formatters and utilities (bytes, decimalBytes, percent, duration, cost, quantity) used in charts, legends, tooltips, tables, and big-number cards. Adds per-table timeBucketThresholds and conditional FINAL handling for time-bucketing. Implements ClickHouse metrics support (table migration, insert helper, types), an OTLP metrics ingest route, an OTLP->ClickHouse exporter, and extensive OTEL metrics plumbing across SDK, CLI, workers, and environment configuration (env vars, metric exporters/readers, metric buffering/processing, machine id propagation).
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
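As an illustration of how a per-column format hint can drive rendering without touching the stored value, here is a minimal TypeScript sketch. The `FormatHint` union covers only a few of the names listed in the walkthrough, and `formatValue`, its units, its rounding, and the assumption that percent values are already on a 0-100 scale are all hypothetical; the PR's actual formatters may behave differently.

```typescript
// Hypothetical subset of the format hints named in the walkthrough.
type FormatHint = "bytes" | "decimalBytes" | "percent" | "durationSeconds";

// Divide by `base` until the value fits the largest suitable unit.
function scaleUnits(value: number, base: number, units: readonly string[]): string {
  let scaled = value;
  let index = 0;
  while (Math.abs(scaled) >= base && index < units.length - 1) {
    scaled /= base;
    index += 1;
  }
  const digits = index === 0 ? 0 : 2;
  return `${scaled.toFixed(digits)} ${units[index]}`;
}

// Render a raw numeric value according to its hint; the raw value is never mutated.
function formatValue(value: number, hint: FormatHint): string {
  switch (hint) {
    case "bytes":
      return scaleUnits(value, 1024, ["B", "KiB", "MiB", "GiB", "TiB"]);
    case "decimalBytes":
      return scaleUnits(value, 1000, ["B", "KB", "MB", "GB", "TB"]);
    case "percent":
      // Assumption: the value is already expressed on a 0-100 scale.
      return `${value.toFixed(1)}%`;
    case "durationSeconds": {
      if (value < 60) return `${value.toFixed(2)}s`;
      const total = Math.round(value);
      return `${Math.floor(total / 60)}m ${total % 60}s`;
    }
  }
}
```

With these assumptions, `formatValue(1536, "bytes")` yields `"1.50 KiB"` while `formatValue(1500, "decimalBytes")` yields `"1.50 KB"`, which is the kind of distinction a separate binary and decimal hint would exist to express.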
…d, decouple flushing metrics with metric bucket intervals
aa450a6 to f3c9dfb
🧹 Nitpick comments (5)
internal-packages/tsql/src/query/printer.ts (3)
743-749: The `format` property is smuggled via a type assertion; consider a typed return.
`analyzeSelectColumn` attaches `format` onto `sourceColumn` (line 984), but `ColumnSchema` doesn't declare that field. Here you cast to `Partial<ColumnSchema> & { format?: ColumnFormatType }` to read it back. This coupling is fragile: a refactor of the return type could silently break the flow.
Consider adding `format?: ColumnFormatType` to `analyzeSelectColumn`'s return type directly (as a sibling of `sourceColumn`) rather than embedding it inside the column object:
♻️ Suggested approach
 private analyzeSelectColumn(col: Expression): {
   outputName: string | null;
   sourceColumn: Partial<ColumnSchema> | null;
   inferredType: ClickHouseType | null;
+  format?: ColumnFormatType;
 } {
Then in the prettyFormat branch, return `format` as a top-level field instead of merging it into `sourceColumn`. The consumer in `visitSelectColumnWithMetadata` can then read `format` directly without a cast.
963-977: `validFormats` list can drift from `ColumnFormatType`; consider deriving one from the other.
The hardcoded `validFormats` array must be kept in sync with `ColumnFormatType` manually. If a new format is added to the union type, forgetting to update this list would silently reject it at runtime.
♻️ Suggested approach
+const PRETTY_FORMAT_TYPES = [
+  "bytes",
+  "decimalBytes",
+  "quantity",
+  "percent",
+  "duration",
+  "durationSeconds",
+  "costInDollars",
+  "cost",
+] as const satisfies readonly ColumnFormatType[];
 // In analyzeSelectColumn:
-const validFormats = [
-  "bytes",
-  "decimalBytes",
-  "quantity",
-  "percent",
-  "duration",
-  "durationSeconds",
-  "costInDollars",
-  "cost",
-];
-if (!validFormats.includes(formatType)) {
+if (!(PRETTY_FORMAT_TYPES as readonly string[]).includes(formatType)) {
Using `satisfies readonly ColumnFormatType[]` ensures the compiler flags any value not in the union, catching drift at build time.
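A self-contained sketch of the same pattern, for readers who want to try it in isolation. It assumes a trimmed-down stand-in for `ColumnFormatType` (the real union in `schema.ts` has more members) and a hypothetical `isPrettyFormatType` guard; `as const satisfies` requires TypeScript 5.0 or later.

```typescript
// Trimmed-down stand-in for the union declared in schema.ts (assumption).
type ColumnFormatType = "bytes" | "percent" | "durationSeconds";

// `satisfies` rejects any entry that is not a member of the union,
// while `as const` keeps the literal type of every entry.
const PRETTY_FORMAT_TYPES = [
  "bytes",
  "percent",
  "durationSeconds",
] as const satisfies readonly ColumnFormatType[];

// Hypothetical runtime guard built on the single source list.
function isPrettyFormatType(value: string): value is ColumnFormatType {
  return (PRETTY_FORMAT_TYPES as readonly string[]).includes(value);
}
```

Note that `satisfies` only checks one direction: a typo in the array fails to compile, but a member added to the union and forgotten in the array does not. Deriving the type from the array, as in `type ColumnFormatType = (typeof PRETTY_FORMAT_TYPES)[number]`, would close that gap.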
2868-2874: Format type validation is skipped when `prettyFormat()` is used outside SELECT.
In `visitCall`, the wrapper is stripped without validating the format type string. Validation only happens in `analyzeSelectColumn` (which only runs for SELECT columns). If a user writes `WHERE prettyFormat(x, 'bogus') > 5`, it silently passes: no error, but also no harm since the wrapper is stripped. This is likely acceptable for a display-only hint, but worth noting for consistency.
apps/webapp/app/v3/querySchemas.ts (2)
575-584: Inconsistent column types for `environment_type` and `machine_name` vs. `runsSchema`.
In `runsSchema`, `environment_type` (line 74) and `machine` (line 395) use `LowCardinality(String)`. Here, both `environment_type` and `machine_name` (line 568) use plain `String` despite having the same bounded `allowedValues`. While these are expression columns extracted from JSON, using consistent TSQL types improves schema coherence and could help the query engine optimize accordingly.
Suggested fix
 machine_name: {
   name: "machine_name",
-  ...column("String", {
+  ...column("LowCardinality(String)", {
     description: "The machine preset used for execution",
     allowedValues: [...MACHINE_PRESETS],
     example: "small-1x",
   }),
   expression: "attributes.trigger.machine_name",
 },
 environment_type: {
   name: "environment_type",
-  ...column("String", {
+  ...column("LowCardinality(String)", {
     description: "Environment type",
     allowedValues: [...ENVIRONMENT_TYPES],
     customRenderType: "environmentType",
     example: "PRODUCTION",
   }),
   expression: "attributes.trigger.environment_type",
 },
485-492: Rename `metric_subject` to be more specific or document its constraints.
`metric_subject` is consistently populated only with `machineId` values (lines 502, 527, 554 in otlpExporter.server.ts), yet the ClickHouse column name is generically scoped as `metric_subject` while the TSQL alias is specifically named `machine_id`. This naming mismatch could confuse developers querying ClickHouse directly or extending the schema later. Either align the ClickHouse column name to `machine_id` to match the TSQL side, or add a schema comment clarifying that `metric_subject` is constrained to machine identifiers.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
apps/webapp/app/v3/querySchemas.ts
internal-packages/tsql/src/query/printer.ts
internal-packages/tsql/src/query/schema.ts
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
**/*.{ts,tsx}: Always import tasks from `@trigger.dev/sdk`, never use `@trigger.dev/sdk/v3` or the deprecated `client.defineJob` pattern
Every Trigger.dev task must be exported and have a unique `id` property with no timeouts in the run function
Files:
internal-packages/tsql/src/query/schema.ts
apps/webapp/app/v3/querySchemas.ts
internal-packages/tsql/src/query/printer.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Import from `@trigger.dev/core` using subpaths only, never import from root
Files:
internal-packages/tsql/src/query/schema.ts
apps/webapp/app/v3/querySchemas.ts
internal-packages/tsql/src/query/printer.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
internal-packages/tsql/src/query/schema.ts
apps/webapp/app/v3/querySchemas.ts
internal-packages/tsql/src/query/printer.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
internal-packages/tsql/src/query/schema.ts
apps/webapp/app/v3/querySchemas.ts
internal-packages/tsql/src/query/printer.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use zod for validation in packages/core and apps/webapp
Files:
apps/webapp/app/v3/querySchemas.ts
apps/webapp/app/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp
Files:
apps/webapp/app/v3/querySchemas.ts
apps/webapp/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
apps/webapp/**/*.{ts,tsx}: When importing from `@trigger.dev/core` in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp
Access environment variables via the `env` export from `apps/webapp/app/env.server.ts`, never use `process.env` directly
Files:
apps/webapp/app/v3/querySchemas.ts
🧠 Learnings (11)
📓 Common learnings
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger.config.ts : Configure OpenTelemetry instrumentations and exporters in trigger.config.ts for enhanced logging
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/otel-metrics.mdc:0-0
Timestamp: 2026-01-08T15:57:09.323Z
Learning: Applies to **/*.ts : Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/otel-metrics.mdc:0-0
Timestamp: 2026-01-08T15:57:09.323Z
Learning: Applies to **/*.ts : When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/otel-metrics.mdc:0-0
Timestamp: 2026-01-08T15:57:09.323Z
Learning: Applies to **/*.ts : When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schemaTask()` from `trigger.dev/sdk/v3` with Zod schema for payload validation
Applied to files:
apps/webapp/app/v3/querySchemas.ts
📚 Learning: 2026-01-15T11:50:06.067Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to internal-packages/database/prisma/migrations/**/*.sql : When editing the Prisma schema, remove extraneous migration lines related to specific tables: `_BackgroundWorkerToBackgroundWorkerFile`, `_BackgroundWorkerToTaskQueue`, `_TaskRunToTaskRunTag`, `_WaitpointRunConnections`, `_completedWaitpoints`, `SecretStore_key_idx`, and unrelated `TaskRun` indexes
Applied to files:
apps/webapp/app/v3/querySchemas.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Attach metadata to task runs using the metadata option when triggering, and access/update it inside runs using metadata functions
Applied to files:
apps/webapp/app/v3/querySchemas.ts
📚 Learning: 2026-01-08T15:57:09.323Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/otel-metrics.mdc:0-0
Timestamp: 2026-01-08T15:57:09.323Z
Learning: Reference the schedule engine (internal-packages/schedule-engine/src/engine/index.ts) as a good example of implementing low-cardinality metric attributes
Applied to files:
apps/webapp/app/v3/querySchemas.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use the `task()` function from `trigger.dev/sdk/v3` to define tasks with id and run properties
Applied to files:
apps/webapp/app/v3/querySchemas.ts
📚 Learning: 2026-01-15T11:50:06.067Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to **/*.{ts,tsx} : Every Trigger.dev task must be exported and have a unique `id` property with no timeouts in the run function
Applied to files:
apps/webapp/app/v3/querySchemas.ts
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.
Applied to files:
apps/webapp/app/v3/querySchemas.ts
🧬 Code graph analysis (3)
internal-packages/tsql/src/query/schema.ts (1)
internal-packages/tsql/src/query/time_buckets.ts (1)
BucketThreshold (22-27)
apps/webapp/app/v3/querySchemas.ts (3)
internal-packages/tsql/src/query/schema.ts (2)
TableSchema (359-408)
column (423-435)
internal-packages/tsql/src/index.ts (3)
TableSchema (116-116)
column (83-83)
BucketThreshold (139-139)
internal-packages/tsql/src/query/time_buckets.ts (1)
BucketThreshold (22-27)
internal-packages/tsql/src/query/printer.ts (4)
internal-packages/tsql/src/query/schema.ts (2)
ColumnSchema (53-254)
ColumnFormatType (279-298)
internal-packages/tsql/src/index.ts (3)
ColumnSchema (110-110)
ColumnFormatType (112-112)
calculateTimeBucketInterval (138-138)
internal-packages/tsql/src/query/ast.ts (3)
Call (482-488)
Constant (463-466)
Field (468-472)
internal-packages/tsql/src/query/time_buckets.ts (1)
calculateTimeBucketInterval (86-100)
🔇 Additional comments (8)
internal-packages/tsql/src/query/schema.ts (2)
273-298: LGTM: `ColumnFormatType` is well-structured as a string union.
Clean use of a string union type (no enum), good separation of existing render types vs. new format hints, and well-documented.
395-407: LGTM: `timeBucketThresholds` and `useFinal` are clean, well-documented additions.
Per-table time bucket overrides and opt-in `FINAL` for ReplacingMergeTree are sensible schema extensions.
internal-packages/tsql/src/query/printer.ts (3)
1618-1633: LGTM: conditional `FINAL` based on `useFinal` is correct.
Good change to make `FINAL` opt-in per table rather than always applied. The redundant `lookupTable` call (already called earlier in the same method) is harmless since it's just a map lookup.
3052-3057: LGTM: per-table time bucket thresholds cleanly wired through.
62-62: LGTM: type-only import of `ColumnFormatType`.
apps/webapp/app/v3/querySchemas.ts (3)
36-36: LGTM: `useFinal: true` is appropriate for a ReplacingMergeTree table.
This ensures deduplication is applied when querying `task_runs_v2`.
602-613: Time bucket thresholds are well-structured and appropriate for 10s pre-aggregated data.
The progression is sensible, and the `satisfies` keyword ensures type safety without losing the literal type. Minor note: with a 30-day TTL on `metrics_v1`, the thresholds beyond 30 days (90d, 180d, 365d) won't be exercised currently, but they're harmless and provide forward-compatibility.
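To make the idea of per-table thresholds concrete, here is a minimal sketch of how a threshold list could map a query's time range to a bucket width. The `Threshold` shape, the values in `metricsThresholds`, and `pickBucketSeconds` are all assumptions for illustration; the real `BucketThreshold` type and `calculateTimeBucketInterval` in `time_buckets.ts` may be shaped differently.

```typescript
// Hypothetical shape; the real BucketThreshold in time_buckets.ts may differ.
type Threshold = { maxRangeSeconds: number; bucketSeconds: number };

const DAY = 24 * 60 * 60;

// Illustrative thresholds for a table pre-aggregated into 10-second buckets.
// These numbers are invented for the example and are not taken from the PR.
const metricsThresholds = [
  { maxRangeSeconds: 60 * 60, bucketSeconds: 10 },
  { maxRangeSeconds: DAY, bucketSeconds: 5 * 60 },
  { maxRangeSeconds: 30 * DAY, bucketSeconds: 60 * 60 },
  { maxRangeSeconds: 365 * DAY, bucketSeconds: DAY },
] as const satisfies readonly Threshold[];

// Pick the first threshold whose range covers the query window,
// falling back to the coarsest bucket for anything larger.
function pickBucketSeconds(rangeSeconds: number, thresholds: readonly Threshold[]): number {
  const coarsest = thresholds[thresholds.length - 1];
  if (coarsest === undefined) {
    throw new Error("at least one threshold is required");
  }
  const match = thresholds.find((t) => rangeSeconds <= t.maxRangeSeconds);
  return (match ?? coarsest).bucketSeconds;
}
```

Keeping the smallest bucket at or above the table's pre-aggregation width (10 seconds here) avoids asking for a finer resolution than the stored rows can provide.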
619-619: LGTM: `metricsSchema` correctly added to the exported `querySchemas` array.
{
  key: "TRIGGER_OTEL_METRICS_EXPORT_INTERVAL_MILLIS",
  value: env.DEV_OTEL_METRICS_EXPORT_INTERVAL_MILLIS,
},
{
  key: "TRIGGER_OTEL_METRICS_EXPORT_TIMEOUT_MILLIS",
  value: env.DEV_OTEL_METRICS_EXPORT_INTERVAL_MILLIS,
}
);
🟡 TRIGGER_OTEL_METRICS_EXPORT_TIMEOUT_MILLIS reuses interval value instead of timeout value
Both dev and prod metric environment variable configuration set TRIGGER_OTEL_METRICS_EXPORT_TIMEOUT_MILLIS to the same value as TRIGGER_OTEL_METRICS_EXPORT_INTERVAL_MILLIS. The export timeout and export interval serve different purposes — the timeout controls how long to wait for a single export operation, while the interval controls how often exports occur.
Affected locations and impact
Dev at apps/webapp/app/v3/environmentVariables/environmentVariablesRepository.server.ts:973-974:
{
  key: "TRIGGER_OTEL_METRICS_EXPORT_TIMEOUT_MILLIS",
  value: env.DEV_OTEL_METRICS_EXPORT_INTERVAL_MILLIS, // should be its own timeout value
}
Prod at apps/webapp/app/v3/environmentVariables/environmentVariablesRepository.server.ts:1123-1124:
{
  key: "TRIGGER_OTEL_METRICS_EXPORT_TIMEOUT_MILLIS",
  value: env.PROD_OTEL_METRICS_EXPORT_INTERVAL_MILLIS, // should be its own timeout value
}
The downstream consumer in tracingSDK.ts further clamps: `Math.min(exportTimeoutMillis, collectionIntervalMs)`, which mitigates the impact in most configurations. However, if someone sets a short export interval (e.g., 2s), the timeout would also be 2s, which may be insufficient for slow network conditions.
Impact: Incorrect metric export timeout configuration; mitigated by downstream clamping in most cases.
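A minimal sketch of what resolving the two settings independently could look like, with the same clamp the comment attributes to tracingSDK.ts applied at the end. `resolveMetricExportTimings`, its parameter names, and the default values are hypothetical and only illustrate the shape of the fix.

```typescript
type MetricExportTimings = {
  exportIntervalMillis: number;
  exportTimeoutMillis: number;
};

// Parse a positive number from an env-style string, or fall back.
function parsePositive(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Hypothetical helper: read interval and timeout from separate inputs,
// then clamp the timeout so it never exceeds the interval.
function resolveMetricExportTimings(
  intervalRaw: string | undefined,
  timeoutRaw: string | undefined,
  defaults: MetricExportTimings = { exportIntervalMillis: 60000, exportTimeoutMillis: 30000 }
): MetricExportTimings {
  const exportIntervalMillis = parsePositive(intervalRaw, defaults.exportIntervalMillis);
  const requestedTimeout = parsePositive(timeoutRaw, defaults.exportTimeoutMillis);
  return {
    exportIntervalMillis,
    exportTimeoutMillis: Math.min(requestedTimeout, exportIntervalMillis),
  };
}
```

With separate inputs, an interval of 10000 and a timeout of 5000 stay distinct. A 2-second interval still clamps the timeout to 2 seconds, so the clamp itself, not only the shared variable, bounds the timeout for short intervals.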
- … `@opentelemetry/host-metrics`) and Node.js runtime metrics (event loop utilization, event loop delay, heap usage)
- … `otel.metrics.getMeter()` from `@trigger.dev/sdk`
- … `prettyFormat()` for human-readable values, and per-schema time bucket thresholds
- … `references/hello-world/src/trigger/metrics.ts`) demonstrating CPU-intensive, memory-ramp, bursty workload, and custom metrics patterns
What changed
Metrics collection (packages/core, packages/cli-v3)
- `TracingSDK` now sets up a `MeterProvider` with a `PeriodicExportingMetricReader` that chains through `TaskContextMetricExporter` (adds run context attributes) and `BufferingMetricExporter` (batches exports to reduce overhead)
- … `@opentelemetry/host-metrics` for process CPU, memory, and system-level metrics
- … `nodejsRuntimeMetrics.ts` module using `performance.eventLoopUtilization()`, `monitorEventLoopDelay()`, and `process.memoryUsage()` to emit 6 observable gauges
- … `otel.metrics` from `@trigger.dev/sdk` so users can create counters, histograms, and gauges in their tasks
- … `system.*` metrics to reduce noise, keeps sending metrics between runs in warm workers
Metrics ingestion (apps/webapp)
- … `otel.v1.metrics.ts` accepts OTEL metric export requests (JSON and protobuf), converts to ClickHouse rows
- … `016_create_metrics_v1.sql` with 10-second aggregation buckets, JSON attributes column, 30-day TTL, and materialized views for 1m/5m rollups
Query engine (internal-packages/tsql, apps/webapp)
- … `task_identifier`, `run_id`, `machine_name`, `worker_version`, etc.) extracted from the JSON attributes column
- `prettyFormat()`: TSQL function that annotates columns with format hints (`bytes`, `percent`, `durationSeconds`) for frontend rendering without changing the underlying data
Reference project
- `references/hello-world/src/trigger/metrics.ts`, 6 example tasks: `cpu-intensive`, `memory-ramp`, `bursty-workload`, `sustained-workload`, `concurrent-load`, `custom-metrics`
Test plan
- … `cpu-intensive`, `memory-ramp`, and `custom-metrics` tasks
- … `SELECT DISTINCT metric_name FROM metrics_v1`
- `prettyFormat` renders correctly in chart tooltips and table cells
- … `system.*` metrics but keeps `process.*` and `nodejs.*`
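The last test-plan item can be pictured as a name-prefix filter. The sketch below is only an illustration: `shouldExportMetric`, the default prefix list, and the sample metric names are assumptions, and the PR may implement the filtering differently or apply it only under certain conditions.

```typescript
// Hypothetical helper: keep a metric unless its name starts with a dropped prefix.
function shouldExportMetric(
  metricName: string,
  droppedPrefixes: readonly string[] = ["system."]
): boolean {
  return !droppedPrefixes.some((prefix) => metricName.startsWith(prefix));
}

// Illustrative metric names; the names actually emitted by the PR may differ.
const sampleMetricNames = [
  "system.cpu.utilization",
  "process.cpu.utilization",
  "nodejs.eventloop.utilization",
];

// Only the process.* and nodejs.* entries survive the filter.
const keptMetricNames = sampleMetricNames.filter((metricName) => shouldExportMetric(metricName));
```

Matching on the trailing dot (`"system."` rather than `"system"`) keeps an unrelated name such as `systemd_restarts` from being dropped by accident.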